Statistical Learning of Semitic Morphology Using Autosegmental Orthography
Abstract: The root-and-pattern system and the system of reduplication are essential to the morphological analysis of Arabic words (McCarthy 1979, 1981). Few computational morphology systems have been designed to parse concatenative morphology, roots, and reduplication simultaneously without the help of a dictionary. Using simple statistics, we present an algorithm that learns both the concatenative morphology and the roots and templates. The approach is analogous to the tier-based autosegmental approach developed by Goldsmith (1976) and applied to Semitic languages by McCarthy (1979).
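The tier-based view mentioned in the abstract can be illustrated with a small sketch. This is not the paper's statistical learner, which the abstract does not describe in detail. It only shows, for a romanized word, what separating a consonantal tier from a vocalic template looks like and how the two recombine. The function names `split_tiers` and `interdigitate`, the vowel inventory `aiu`, and the transliterated examples are assumptions made for illustration. Affixes and reduplication are deliberately ignored.

```python
# Illustrative sketch only: assumes romanized input and a three-vowel inventory.
VOWELS = set("aiu")


def split_tiers(word):
    """Split a word into a consonantal tier and a template with C slots."""
    root = "".join(ch for ch in word if ch not in VOWELS)
    template = "".join(ch if ch in VOWELS else "C" for ch in word)
    return root, template


def interdigitate(root, template):
    """Fill the C slots of a template, left to right, with root consonants."""
    if template.count("C") != len(root):
        raise ValueError("root length must match the number of C slots")
    consonants = iter(root)
    return "".join(next(consonants) if slot == "C" else slot for slot in template)


if __name__ == "__main__":
    print(split_tiers("kataba"))
    print(interdigitate("drs", "CaCaCa"))
```

Because every consonant is treated as part of the root, this naive split would misanalyze words with consonantal affixes or geminated radicals, which is the kind of case a learned analysis has to handle.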
Related resources
Edge-in association and OCP ‘violations’ in Tigrinya
An important issue in the application of autosegmental principles to Semitic morphology is the way the independent consonantal root is associated to the template provided by the morphology. The three most obvious proposals are from left to right, from right to left, and from the edges in toward the center. I argue that association in the Ethiopian Semitic language Tigrinya is from the edges in,...
Morpho-syntactically Annotated Amharic Treebank
In this paper, we describe an ongoing project of developing a treebank for Amharic. The main objective of developing the treebank is to use it as an input for the development of a parser. Morphologically rich languages such as Arabic, Amharic, and other Semitic languages present challenges to the state of the art in parsing. In such languages, morphemes play important functions in both morphology and syn...
Syllable-Based Speech Recognition for Amharic
Amharic is the Semitic language with the second-largest number of speakers, after Arabic (Hayward and Richard 1999). Its writing system is syllabic with Consonant-Vowel (CV) syllable structure. Amharic orthography has more or less a one-to-one correspondence with syllabic sounds. We have used this feature of Amharic to develop a CV syllable-based speech recognizer, using Hidden Markov Modeling...
Lexical Representation of Multiword Expressions in Morphologically-Complex Languages
In spite of the surging interest in multiword expressions (MWEs) in recent years, it is still unclear how such expressions should be stored in computational lexicons. This problem is amplified in morphologically-complex languages, where the unique properties of MWEs interact with non-trivial morphological processes. We propose an architecture for lexical representation of MWEs, augmented ...
Identifying Semitic Roots: Machine Learning with Linguistic Constraints
Words in Semitic languages are formed by combining two morphemes: a root and a pattern. The root consists of consonants only, by default three, and the pattern is a combination of vowels and consonants, with non-consecutive “slots” into which the root consonants are inserted. Identifying the root of a given word is an important task, considered to be an essential part of the morphological analy...
Publication date: 2005